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Abstract. The subjects and key questions faced by computational astro- 
physics using A^-body simulations are discussed in the fields of globular star 
cluster dynamics, galactic nuclei and cosmological structure formation. Af- 
ter a comparison of the relevance of different A^-body algorithms a new 
concept for a more fiexible customized special purpose computer based 
on a combination of GRAPE and FPGA is proposed. It is an ideal ma- 
chine for all kinds of A^-body simulations using neighbour schemes, as the 
Ahmad-Cohen direct A^-body codes and smoothed particle hydrodynamics 
for systems including gas. 



1. Introduction 

Gravitation is the only one of the four fundamental forces in physics which 
cannot be shielded by particles of opposite charge and whose force at long 
range interactions does not have any (e.g. exponential) cutoff; the gravita- 
tional force is described by Newton's inverse square force law to the largest 
possible ranges in which classical mechanics applies. So it does not have 
a preferred scale and as a consequence gravitational forces play an impor- 
tant role for the dynamical evolution of astrophysical systems on practi- 
cally all scales. Among the most challenging problems which have been 
treated by purely gravitational A^-body simulations are structure forma- 
tion in the universe, evolution of galactic nuclei and globular star clusters. 
Even if non-gravitative effects become important, like hydrodynamics or 
magnetohydrodynamic forces (e.g. for star formation or dissipative galaxy 
formation) some of the methods to solve the relevant dynamical equations 
represent the gas by particles interacting by a combination of gravitational 
and non-gravitative forces (smoothed particle hydrodynamics, SPH, Lucy 
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1977, Gingold & Monaghan 1977). Thus, simulations using particles to fol- 
low the dynamical evolution of astrophysical systems are one of the most 
important tools in computational astrophysics and have become a third in- 
dependent experimental field of astrophysical research besides theory and 
observations. 

This paper is organized as follows. First we present some key questions 

which arc being addressed by present or future direct A^-body simulations in 
the fields of globular star clusters, galactic nuclei, and cosmological models 
of structure formation (Sect. 2). Second, the main algorithms for astrophys- 
ical iV-body simulations arc briefly introduced and discussed in comparison 
with each other (Sect. 3). In Sect. 4 their implementation on special pur- 
pose hardware is discussed and a new concept for faster and more flexible 
hardware tailored to various kinds of direct A/"-body simulations including 
gas dynamics is presented. 

2. Some Astrophysical Key Questions 

2.1. GLOBULAR STAR CLUSTERS 

There is an excellent review of the internal dynamics of globular star clus- 
ters (Meylan & Heggie 1997). Here we only want to stress some selected 
topics which are relevant to the subject of AT-body simulations. Globular 
star clusters are nicely spherical (sometimes slightly flattened) star clusters 
consisting of 10® - 10® stars. Since their escape velocity is small compared 
to the typical velocities of stellar winds and explosions they are practically 
gas-free. They are ideal stellar dynamical laboratories because their relevant 
thermal (two-body relaxation) and dynamical timcscales are smaller than 
their lifetime. Globular cluster systems exist around many other galaxies 
as well (see e.g. Forbes, Brodie & Grillmair 1997). 

Due to their relative isolation, lack of observable interstellar gas and 
due to their symmetry globular clusters are well approximated by simpli- 
fied theoretical models. Since the relaxation timescale is long compared to 
the dynamical time they develop through a sequence of dynamical (virial) 
equilibria. The fundamental kinetic equation in such case is the Fokker- 
Planck equation. Inspired by plasma physical work the use of this equation 
for stellar dynamics goes back to Cohn (1980). Recent models of that type 
include the effects of anisotropy (differences between radial and tangen- 
tial velocity dispersion, which can be present even in spherical systems, 
Takahashi 1996, 1997). Another improvement includes for the first time 
the effect of rotation for those of the globular clusters which are slightly 
flattened (Einsel &; Spurzem 1997). Also anisotropic gaseous models based 
on a moment evaluation of the Fokker-Planck equation were successfully 
used (Louis & Spurzem 1991, Spurzem 1994). 
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In the presence of self-gravitation many concepts of thermodynamics, 
however, have to be used with care. So, for example, there is no global 
thermodynamic equilibrium, because a system of gravitating point masses 
can always achieve infinite amounts of binding energy just by moving two 
or more or all of the particles closer and closer together. At some limiting 
central velocity dispersion general relativity takes over and the cluster col- 
lapses by a dynamical instability towards a black hole (Shapiro &: Teukolsky 
1985). Before reaching that limit most realistic astrophysical systems, how- 
ever, will reach the limit of physical collisions and merging of their stars. 
Another alternative is that strong two-body correlations (close binaries) 
form which subsequently stop the global gravitational collapse by supere- 
lastic scatterings with field stars (Bettwieser & Sugimoto 1984). If there 
are too many binaries, however, the fundamental assumption of using the 
one-particle distribution function and the Fokker-Planck equation breaks 
down. 

The second obstacle of thermodynamic methods to treat astrophysical 
ensembles of particles is simply that particle numbers are not large enough. 
Therefore stochastic fluctuations, deviations from thermodynamic expecta- 
tion values in individual representations of e.g. star clusters are much more 
significant than in any laboratory gas; the amplitude of such fluctuations 
can be of a size comparable to that of the observed quantity. Hence it is 
by no means guaranteed, that a given individual globular cluster consisting 
of some one million or less particles strictly evolves according to mod- 
els derived from statistical mechanics. As a consequence Giersz & Heggie 
(1994a,b, 1996, 1997) in their seminal series of papers compared the re- 
sults of statistical models based on the Fokker-Planck approximation with 
ensemble averages of a number of statistically independent direct A^-body 
simulations. Prom this and similar work (Spurzem Sz Aarseth 1996, Makino 
1996) one can conclude that in spherical isolated clusters statistical models 
in spherical symmetry with standard two-body relaxation work fairly well, 
but already in the case of a galactic tidal field with an enhanced mass loss by 
stellar escapers, severe problems occur in understanding the results of the 
direct A^-body models and their relation to the results of the Fokker-Planck 
results (Heggie 1997, this volume). Note that such problems occur for one 
of the still most simplified globular cluster models; no rotation and no mass 
loss by stellar evolution was included, stars were considered as point masses 
and no effects of binary stellar evolution taken into account, no primordial 
binaries present, no time-dependent three-dimensional tidal field, and so 
on. Consequences from that are two-fold: first great care should be taken 
advancing Fokker-Planck and gaseous models to more complicated situa- 
tions, second direct A/^-body models should be seen either as a theoretical 
tool to check and gauge the statistical models (eventually after a process 
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of ensemble averaging) or they should employ a realistic particle number 
to directly model an individual, real star cluster. Besides the questions of 
gravothcrmal oscillations in very large A^-body systems and the scalability 
of cluster models in a galactic tidal field, which are treated elsewhere, we 
would like to stress here the importance to acquire information on the pre- 
and post-collapse evolution of A/"-body models of rotating globular clusters, 
for which very little is known yet. 

From the previous paragraphs it should be clear, that only an exact, 
direct A^-body integrator should be used for problems of globular cluster 
dynamics, which treats the two-body relaxation by small angle gravitative 
encounters of all impact parameters with a maximal accuracy and simul- 
taneously, accurately and efficiently follows the formation and evolution of 
very close binaries, whose timescales differ by many orders of magnitude 
from the dynamical timescalc of the whole cluster. Such requirements are 
fulfilled by Nbody5 (Aarseth 1985) and its successors (see Sect. 3). As a final 
remark for this subsection, we stress that in very young stellar clusters, like 
newly forming globular clusters seen around merging galaxies (Schweizer et 
al. 1996), the timescales to deplete the cluster from remaining protostellar 
gas by stellar winds, for star formation and evolution of massive stars, are 
comparable to the dynamical time of the cluster. Since mass segregation by 
two-body relaxation can be faster than the standard relaxation by a fac- 
tor of M/m, where M are the most massive species, and m is the average 
particle mass in the cluster, even two-body effects are not completely negli- 
gible at cluster formation. Modelling such situation in a context of cooling 
and fragmentation of a gas cloud (Murray & Lin 1996) including stellar 
dynamical effects would require an highly accurate A'^-body integrator in 
dynamical coupling with a gaseous component. 

2.2. GALACTIC NUCLEI 

Another long-standing problem of collisional stellar dynamics is the ques- 
tion of the equilibrium system and dynamical evolution of a cusp of stars 
surrounding a massive central black hole. Such massive black holes are 
very likely to reside in the centres of galaxies as a fossile of earlier acticity 
(Kormendy & Richstone 1995). Their formation as a result of collision- 
less dynamical general relativistic collapse and dissipative processes dur- 
ing galaxy formation is very likely but not yet fully understood (Quinlan 
&; Shapiro 1990). Frank & Rees (1976) examined the interplay between 
mass and energy transport by two-body relaxation and loss-cone accre- 
tion of stars on orbits with low angular momentum by the black hole; their 
results were confirmed by Monte-Carlo numerical models of Marchant &; 
Shapiro (1980), later followed by multi-mass direct numerical solutions of 
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the ID Fokker-Planck equation for isotropic stellar cusps of Murphy, Cohn 
& Durisen (1991). Only recently the first self-consistent AT-body models of 
massive black holes including a sufficient number of stars in their surround- 
ing cusps were done by use of hybrid A^-body algorithms (Quinlan 1996, 
Quinlan &; Hernquist 1997) or a high-speed special purpose computer for 
a direct summation algorithm (Makino & Ebisuzaki 1996, Makino 1997). 
However, the latter work was occupied mainly with the question of dynam- 
ical friction of black hole binaries in a galactic nucleus after a merger event. 
Still the standard picture of Prank Sz Rees (1976) has not yet carefully been 
checked by using a direct full A/^-body simulation. It is not certain, whether 
the assumption that two-body relaxation dominates the evolution is cor- 
rect; it has been suggested that large angle close encounters of stars with 
each other and with the black hole compete with it, and that there may 
be non-standard relaxation processes at work (Ranch & Tremainc 1996). 
These are interesting open question to tackle with high accuracy pure par- 
ticle A''-body simulations. Even more important as in the case of globular 
clusters, however, are the possible effects of gas produced by stellar colli- 
sions, which can accumulate in the centre due to the much deeper central 
potential, and form new stars (Quinlan & Shapiro 1990, Rees 1997) We 
would like to conclude this subsection with the final remark that this is 
again a physical situation where highly accurate direct A'^-body models, 
eventually dynamically coupled with the dynamics of a gas component are 
very important for future understanding of such objects. 



2.3. COSMOLOGY AND STRUCTURE FORMATION 



In the standard paradigm of cosmological structure formation primordial 
quantum fluctuations grow gravitationally in a Tinivcrsc dominated by non- 
dissipative dark matter. In the non-linear regime the distribution of masses 
can be estimated by simple theory (Press &; Schechter 1974), later extended 
to A/"-body models (e.g. White et al. 1987, Navarro, Prenk & White 1997). 
On small scales gas physics, which (e.g. in the case of star formation) is 
only known approximately has to be included into the models (Steinmetz 
1996). Recently it has been shown, that softening of the gravitational po- 
tential, which was adopted in most of the models, causes spurious two- 
body relaxation effects (Steinmetz & White 1997). Consistently Moore et 
al. (1997) find that the structure of cold-dark-matter (CDM) haloes signifi- 
cantly changes if models with much higher resolution in particle number are 
used. Again, we want to conclude here that high resolution, high-accuracy 
A?^-body simulations, gravitationally coupled with a gas component are use- 
ful to study such questions. 
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3. Numerical Methods of A^-body integration 



TABLE 1. Algorithms for iV-body Simulations 



Number Name Scaling 
No PARTICLE-PARTICLE RELAXATION: 



1 PM - particle mesh 0{N) + 0{n^) 

2. Fast Multipole 0{N) + 0{nlm) 

3. Self Consistent Field 0{N) + 0{nlm) 

"Exact" : 

4. NbodyI - 4 0{N'^) 

5. NbodyS - 6 0{NNu) + 0{N'^) 

6. KiRA 

"Mixed" : 

7. Tree O(iVIogiV) 

8. Oiyl) + 0(\)0(nlin) 



In Table 1 an overview over the most commonly used present algorithms 
for direct A^-body simulations is given. The symbols used in the "Scaling" 
column denote the particle number N, a neighbour number Nn (compare 
Sect. 4), a grid resolution n or the number of terms nlm in a series evalu- 
ation of the gravitational potential. We want to comment only very briefly 
on each of the methods to give an overview for the reader. The first group 
has been labelled "no particle-particle relaxation" because it does not use 
direct gravitational forces between particles. The gravitational potential 
is computed from the particle configuration via an intermediate step, ei- 
ther through a mesh in coordinate space or an orthogonal function series. 
Reviews on classical particle mesh (PM) techniques can be found in Sell- 
wood (1987). "SuPERBOx" is a multi-grid method in a classical PM scheme 
suitable for high resolution problems and relaxing the inflexibility of con- 
ventional PM methods somehow (Madejsky &; Bien 1993). Fast multipole 
methods used e.g. by Greengard (1990) and Greengard & Rokhhn (1987) 
can only efficiently be used for codes using the same tinicstcp for parti- 
cles, which makes them unfeasible for astrophysical problems with grav- 
itating particles developing into highly structured and/or inhomogeneous 
states. Codes using an orthogonal series expansion (so-called "self consistent 
field" or SCF codes) have been introduced to the astrophysical community 
mainly by Hernquist &: Ostriker (1992), although there are earlier similar 
approaches (e.g. Clutton-Brock 1972). 
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In all cases where a highly accurate computation of the gravitational 
potential with all its graininess due to individual particles, responsible for 
various relaxation effects, is necessary, there is no way to avoid a direct 
brute-force summation algorithm, where individual pairwise forces are com- 
puted. Such approach goes back to Aarseth (1963) and von Hoerner (1960). 
Close encounters and the formation of binaries, whose binding energies are 
large compared to the thermal energy of the system have led to the inclu- 
sion of regularization in such codes (named NbodyI - 4, Aarseth 1996). In 
addition to an individual time step scheme and a high order time integrator 
some versions (named NbodyS, 6, Aarseth 1985, Makino & Aarseth 1992) 
also use an Ahmad-Cohen neighbour scheme to reduce the number of total 
force computations required. The algorithm is well parallelizable and has 
been implemented on general purpose parallel computers (Spurzem 1997). 
The KiRA code is a new development by Hut, McMillan &; Makino (cf. 
www . sns . ias . edu/st ar lab /star lab . html) . 

At the end of Table 1 there are the "mixed" codes; one is the TREE-Code 
(Barnes & Hut 1986), using particle-particle forces in principle; it groups, 
however, subsets of particles in some distance from a test particle together, 
taking only their centres of masses into account (and if required also some 
multipole moments of the mass distribution) . It is highly efficient for lumpy 
particle configurations, where the configuration has a small overall filling 
factor, and has been used very successfully for large-scale cosmological sim- 
ulations and models of merging galaxies, partly even including a gas com- 
ponent treated by SPH (just as examples of most recent work look at e.g. 
Dave, Dubinski & Hernquist 1997, Mihos, Dubinski & Hernquist 1997). 
Most TREE-Code implementations do not require very high accuracies, for 
example an energy error of up to a few percent is generally tolerated. En- 
forcing in a TREE-code very high accuracy as it is required for globular 
cluster models (10~^ %, a typical value achieved for direct Aarseth Nbody- 
integrators) leads to a very significant reduction of its efficiency (McMillan 
& Aarseth 1993). Finally, another "mixed" code also used especially for 
cosmological simulations with an SPH gas component is the P^M-code, for 
which we refer to a recent paper of Pearce & Couchman (1997). 

4. Hardware 

The construction of special-purpose hardware to compute gravitational 
forces in direct iV-body simulations was inspired by the fact that the total 
CPU time required for one time step of all particles scales as T = aN+(3N^, 
with some numerical time constants a and p. For large particle num- 
ber the second part (the pairwise force calculation) consumes most of 
the time (see Sugimoto et al. 1990). So our Japanese colleagues built the 
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GRAPE hardware, whose development culminated in the presentation of 
the GRAPE-4 Teraflop computer (Makino et al. 1997, see also the papers 
of Makino and Taiji in this volume). The latter could be so fast, that ac- 
cording to Amdahl's law the other parts of the code (e.g. to advance the 
particles) become the bottleneck of the simulation, especially if not just the 
simplest possible TV-body integrator is used, but a neighbour scheme like 
Nbody6 or, even worse, a code which also includes some gas dynamical SPH 
calculations (see e.g. Steinmetz 1996). A prototype timing formula for the 
CPU time required per timestep in such cases is T = aN + /3N'^ + SN ■ Nn, 
where 5 is another time constant and Nn a typical neighbour number, for 
which neighbour forces (in case of Nbody6) or gas dynamical forces (in case 
of SPH) are to be calculated on a test particle. We are proposing to use here 
reconfigurable hardware based on field-programmable gate arrays (FPGA) 
which has been developed at the University of Mannheim, Germany (cf. 
e.g. Hogl et al. 1995). The neighbour operations could be mapped onto 
the FPGA device and both this device and the GRAPE machine could be 
connected to a standard host workstation. In Fig. 1 the expected speed-up 
of such a configiuation (models E, G) compared to a standard configura- 
tion with GRAPE (models D, F) for such applications is depicted. Since 
the FPGA device is somewhat more flexible to code and adapt to different 
problems, and since such type of machine would be ideally suited for high 
accuracy gravitational force computations in an A^-body simulation with 
a gas component it is called AHA-GRAPE (Adaptive HydrodynAmics 
GRAPE). 



The data of Fig. 1 were obtained under a number of conservative and 
simplified assumptions, e.g. all effects of parallelization were neglected, host 
speeds of 50 Mflops, GRAPE speeds of 500 Gflops ("fast") and 50 Gflops 
("normal"), AHA (FPGA) speeds of 5 Gflops ("fast") and 500 Mflops ("nor- 
mal") were adopted. The bus connection from the host to AHA and GRAPE 
was assumed having a bandwidth of 100 MB/s ("fast") and 10 MB/s ("nor- 
mal"). The number of floating point operations assumed was 20 for a grav- 
itational force calculation on GRAPE, 100 for all neighbour operations per 
neighbour, and 100 to advance a particle by a high order integrator on its 
orbit. All adopted values arc considered as rather conservative, especially 
if the technological evolution of the next 5-10 years is allowed for. It is 
concluded that the overall speedup of such a combined machine is consid- 
erable, that it will be an ideal tool for high-accuracy A?^-body simulations 
including an SPH gas component, and that such a prototype will open up 
the road to very fast and flexible customized computers for astrophysical 
A?^-body simulations in the future. More details will be published elsewhere. 
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Figure 1. CPU time per step required for a simulation with direct gravitational force 
computation and neighbour scheme (SPH gas dynamics or Ahmad-Cohen A^-body code) 
as a function of particle number for the proposed AHA-GRAPE machine (E, G) and 
a standard GRAPE-host combination (D, F), for a "normal" (D, E) and "fast" (F, G) 
GRAPE. Details see main text. 
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